Content attribution ignoring content

Samory M.; Peserico E.
2016

Abstract

Can we tell the author of a message without reading the message? This work tackles authorship analysis through features that ignore the explicit content of a contribution: informally, those that can be computed even if every character in the body of a message (but not metadata such as timing or "likes") is replaced by an X. Focusing on forum posts, we distil a case-study set of these content-agnostic features, and prove its viability for authorship verification and attribution, using data from four online forums (of different size, language, and topic). A simple classification testbed, relying exclusively on content-agnostic features, confirms the author of a message with 76% accuracy, and discriminates between two candidate authors with 94% accuracy. Being able to re-identify a user without looking at the content of her contributions poses a serious threat to common data anonymization practices.
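
As a rough illustration of the "replace every character by an X" idea, the sketch below masks a post body and computes a few features that survive the masking. The Post record, the mask_body and content_agnostic_features helpers, and the four features chosen (length, hour, weekday, likes) are hypothetical names picked for this example; they are not the feature set used in the paper, which this record does not list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical forum post: body text plus metadata."""
    body: str
    posted_at: datetime
    likes: int


def mask_body(body: str) -> str:
    """Replace every character of the body with 'X' (only the length survives)."""
    return "X" * len(body)


def content_agnostic_features(post: Post) -> dict:
    """Features that do not depend on which characters the body contains.

    These four are illustrative assumptions, not the paper's features.
    """
    return {
        "length": len(post.body),            # unchanged by masking
        "hour": post.posted_at.hour,         # timing metadata
        "weekday": post.posted_at.weekday(), # timing metadata (Monday == 0)
        "likes": post.likes,                 # social metadata
    }


original = Post("Hello, world!", datetime(2016, 5, 22, 14, 30), likes=3)
masked = Post(mask_body(original.body), original.posted_at, original.likes)

# A feature is content-agnostic in this informal sense if masking leaves it unchanged.
same = content_agnostic_features(original) == content_agnostic_features(masked)
```

Here the comparison holds because none of the features inspects individual characters; a feature such as word frequency would fail the same check.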
8th ACM Web Science Conference, WebSci 2016
Attribution; Authorship; Forum; Identification; Privacy; Social; Structural features; Timing
04 Publication in conference proceedings::04b Conference paper in volume
Content attribution ignoring content / Samory, M.; Peserico, E. - (2016), pp. 233-243. (Paper presented at the 8th ACM Web Science Conference, WebSci 2016, held in Hannover, DE) [10.1145/2908131.2908156].

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1655758